Unlock Unstructured Data

Hui Lin

2016-12-28

Outline

What is Unstructured Data?

Why? It is everywhere

Why? Practical arguments

Where to get the data?

API: Trump vs. Clinton Wikipedia Page Views
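The slide does not say which API call or package produced the page-view series. The sketch below assumes the public Wikimedia REST pageviews endpoint, with the path layout `per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}`. The helper name `pageviews_url` and the date range are hypothetical. The code only builds the request URLs, so it runs offline.

```r
# Hypothetical helper: build a Wikimedia REST API URL for daily page views
# of one English Wikipedia article (assumed endpoint layout, see lead-in)
pageviews_url <- function(article, start, end,
                          project = "en.wikipedia",
                          access = "all-access",
                          agent = "all-agents",
                          granularity = "daily") {
  # Wikipedia article titles use underscores instead of spaces
  article <- gsub(" ", "_", article, fixed = TRUE)
  paste("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article",
        project, access, agent, URLencode(article, reserved = TRUE),
        granularity,
        format(as.Date(start), "%Y%m%d"),   # dates as YYYYMMDD
        format(as.Date(end), "%Y%m%d"),
        sep = "/")
}

# One URL per candidate; the date range is an arbitrary example
urls <- sapply(c("Donald Trump", "Hillary Clinton"), pageviews_url,
               start = "2016-07-01", end = "2016-11-30")
urls
```

With network access, `jsonlite::fromJSON(urls[1])$items` should return one row per day with `timestamp` and `views` columns. Check the current API documentation before relying on it.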

Static Web: Wikipedia

Static Web: buzzfeed.com
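The scraping code for these pages is not shown. A real scraper would normally use an HTML parser such as rvest. The sketch below deliberately uses only base R regular expressions on a small, made-up HTML string. It shows the idea (get the page, locate the tags, keep their text), but this approach is brittle on real pages.

```r
# Made-up HTML standing in for a downloaded page; on a live site the text
# could come from paste(readLines(url, warn = FALSE), collapse = " ")
html <- paste(
  "<html><body>",
  "<h2 class='title'><a href='/post-1'>First headline</a></h2>",
  "<p>Some teaser text</p>",
  "<h2 class='title'><a href='/post-2'>Second headline</a></h2>",
  "</body></html>"
)

# Find every <h2>...</h2> block (non-greedy match, PCRE syntax)
h2 <- regmatches(html, gregexpr("<h2[^>]*>.*?</h2>", html, perl = TRUE))[[1]]

# Strip the remaining tags, leaving only the visible headline text
headlines <- trimws(gsub("<[^>]+>", "", h2))
headlines
## [1] "First headline"  "Second headline"
```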

Summary of packages

Automatic Data Pipeline
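The slide gives no pipeline details. The following is one possible minimal step: fetch new records, stamp them with the collection time, and append only unseen ids to a CSV store. A scheduler such as cron could then run the script with `Rscript` at fixed intervals. The function name `update_store`, the `id` column, and the fake fetcher are all hypothetical.

```r
# Hypothetical pipeline step: fetch() must return a data frame with an "id"
# column; new rows are time-stamped and appended unless their id is stored
update_store <- function(fetch, store_path) {
  new <- fetch()
  new$collected_at <- rep(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), nrow(new))
  if (file.exists(store_path)) {
    old <- read.csv(store_path, stringsAsFactors = FALSE)
    new <- new[!(new$id %in% old$id), , drop = FALSE]  # skip ids already stored
    combined <- rbind(old, new)
  } else {
    combined <- new
  }
  write.csv(combined, store_path, row.names = FALSE)
  invisible(nrow(new))  # number of records actually added
}

# Usage with a fake source that always returns the same two records
store <- tempfile(fileext = ".csv")
fake_fetch <- function() data.frame(id = c("t1", "t2"),
                                    text = c("hello", "world"),
                                    stringsAsFactors = FALSE)
added_first  <- update_store(fake_fetch, store)  # 2 rows added
added_second <- update_store(fake_fetch, store)  # 0 rows: ids already stored
nrow(read.csv(store))
```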

Data Analytics: Regular Expression

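# Keep only the elements that start with "P" followed by one or more digits
x <- c("here", "is", "P9929AMXT", "a", "P9703AM", "baby",
    "P0506AM", "example", "P1197AM", "P1271AM")
idx <- grep("^P[[:digit:]]+", x)  # positions of the matching elements
x[idx]
## [1] "P9929AMXT" "P9703AM"   "P0506AM"   "P1197AM"   "P1271AM"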
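The same pattern can also pull pieces out of each match, not just filter the vector. The sketch below is a base-R extension of the example above, not taken from the slides. `grep(value = TRUE)` returns the matches directly, a capture group with `sub()` keeps the digits, and a second `sub()` keeps the trailing letters.

```r
x <- c("here", "is", "P9929AMXT", "a", "P9703AM", "baby",
    "P0506AM", "example", "P1197AM", "P1271AM")

# Return the matching elements themselves instead of their positions
codes <- grep("^P[[:digit:]]+", x, value = TRUE)

# Capture group \\1 keeps only the digits after the leading "P"
digits <- sub("^P([[:digit:]]+).*$", "\\1", codes)
# Removing "P" plus the digits leaves the letter suffix
suffix <- sub("^P[[:digit:]]+", "", codes)
digits
## [1] "9929" "9703" "0506" "1197" "1271"
suffix
## [1] "AMXT" "AM"   "AM"   "AM"   "AM"
```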

Data Analytics: Natural Language Processing

NLP: How does a computer understand language?
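A common first answer is the bag-of-words representation. Each text becomes a vector of word counts, which a computer can compare or feed into a model. Below is a minimal base-R sketch on two made-up sentences. The slides may have used a dedicated text-mining package instead.

```r
# Two made-up documents
docs <- c(d1 = "The cat sat on the mat.",
          d2 = "The dog sat on the log!")

# Tokenize: lower-case, split on anything that is not a letter, drop empties
tokens <- lapply(docs, function(s) {
  w <- strsplit(tolower(s), "[^a-z]+")[[1]]
  w[w != ""]
})

# Document-term matrix: one row per document, one column per word
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(w) table(factor(w, levels = vocab))))
dtm
```

Word order is lost, so "the cat sat on the mat" and "on the mat the cat sat" get the same row. That simplification is what makes the representation easy to compute.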

Issue driven

NLP: What are you interested in?

## [1] "English is a crazy language"
## [1] "English muffins"

Marketing Campaign: #yieldhero

#yieldhero Summary Statistics
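The slide does not list which statistics were computed. The sketch below shows a few typical ones for a set of collected tweets. The data are made up. The column names (`screenName`, `isRetweet`, `retweetCount`) are assumptions for illustration.

```r
# Made-up tweets; column names are assumed, not from the original analysis
tweets <- data.frame(
  screenName   = c("ann", "bob", "ann", "cai", "bob"),
  isRetweet    = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  retweetCount = c(3, 0, 1, 0, 5),
  stringsAsFactors = FALSE
)

stats <- c(
  tweets        = nrow(tweets),                      # total tweets
  unique_users  = length(unique(tweets$screenName)), # distinct authors
  retweet_share = mean(tweets$isRetweet),            # fraction that are RTs
  # retweets received by original (non-RT) tweets
  retweets_received = sum(tweets$retweetCount[!tweets$isRetweet])
)
stats
```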

When is the best time to tweet?
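One simple way to approach this question is to count tweets (or engagement) by hour of day. The sketch below uses made-up UTC timestamps. On real data, convert the times to the audience's local time zone first, because the busiest hour depends on it.

```r
# Made-up tweet timestamps (UTC)
created <- as.POSIXct(c("2016-11-01 09:15:00", "2016-11-01 09:45:00",
                        "2016-11-01 13:05:00", "2016-11-02 09:30:00",
                        "2016-11-02 20:10:00"), tz = "UTC")

# Hour of day as a factor with all 24 levels, so empty hours show as 0
hour <- factor(as.integer(format(created, "%H")), levels = 0:23)
by_hour <- table(hour)

# Hour with the most tweets
busiest <- as.integer(names(which.max(by_hour)))
busiest
## [1] 9
```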

Who to target?

Network

Who to target?
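The network slide presumably links users who mention each other. Below is a base-R sketch, on made-up tweets, that turns @mentions into an edge list and ranks users by how often they are mentioned (in-degree). Treating highly mentioned users as targets is an assumption of this sketch. A package such as igraph could take `edges` via `graph_from_data_frame()` for plotting and richer measures.

```r
# Made-up tweets: author and text
tweets <- data.frame(
  screenName = c("ann", "bob", "cai", "ann"),
  text = c("Great yields this year @bob @dee",
           "Thanks @ann! #yieldhero",
           "RT @ann: Great yields this year",
           "@dee what seed did you use?"),
  stringsAsFactors = FALSE
)

# Extract @mentions (handles: letters, digits, underscore)
mentions <- regmatches(tweets$text, gregexpr("@[A-Za-z0-9_]+", tweets$text))

# Edge list: one row per (author -> mentioned user) pair
edges <- data.frame(
  from = rep(tweets$screenName, lengths(mentions)),
  to   = sub("^@", "", unlist(mentions)),
  stringsAsFactors = FALSE
)

# In-degree: how many times each user is mentioned
in_degree <- sort(table(edges$to), decreasing = TRUE)
in_degree
```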

Products Mentioned

Table of Products
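A table like this can be built by matching a dictionary of product names against the tweet text. The product names and tweets below are hypothetical placeholders. The sketch counts tweets that mention each product as a whole word, ignoring case.

```r
# Hypothetical product dictionary and made-up tweets
products <- c("ProductA", "ProductB", "ProductC")
text <- c("Switched to ProductA this spring",
          "producta and productb #yieldhero",
          "No product talk here",
          "ProductC results are in")

# Number of tweets mentioning each product (whole word, case-insensitive)
counts <- sapply(products, function(p)
  sum(grepl(paste0("\\b", p, "\\b"), text, ignore.case = TRUE, perl = TRUE)))

product_table <- data.frame(product = products, tweets = counts,
                            row.names = NULL, stringsAsFactors = FALSE)
product_table[order(-product_table$tweets), ]
```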

Shiny App Example

library(shiny)
# Launch the Shiny app stored in the local Rcode/Shiny_NLP directory
runApp('Rcode/Shiny_NLP')

Recommendations for your work

Trick: robots.txt
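Before scraping a site, read its robots.txt (for example https://en.wikipedia.org/robots.txt) to see which paths the owner asks crawlers to avoid. The checker below is a simplified sketch with a hypothetical function name `is_disallowed`. It reads only the `User-agent: *` group and treats `Disallow` rules as plain path prefixes. It ignores wildcards, `Allow` lines, and multi-agent groups, so it is not a full implementation of the robots exclusion rules.

```r
# Simplified check: is `path` disallowed for all crawlers ("User-agent: *")?
is_disallowed <- function(robots_lines, path) {
  lines <- trimws(sub("#.*$", "", robots_lines))  # drop comments
  in_star_group <- FALSE
  disallow <- character(0)
  for (ln in lines) {
    if (grepl("^user-agent:", ln, ignore.case = TRUE)) {
      agent <- trimws(sub("^[^:]*:", "", ln))
      in_star_group <- agent == "*"
    } else if (in_star_group && grepl("^disallow:", ln, ignore.case = TRUE)) {
      rule <- trimws(sub("^[^:]*:", "", ln))
      if (nzchar(rule)) disallow <- c(disallow, rule)  # empty rule allows all
    }
  }
  # Disallowed if the path starts with any collected prefix
  any(vapply(disallow, function(r) startsWith(path, r), logical(1)))
}

robots <- c("# example file",
            "User-agent: *",
            "Disallow: /private/",
            "Disallow: /tmp",
            "",
            "User-agent: BadBot",
            "Disallow: /")
is_disallowed(robots, "/private/data.html")  # TRUE
is_disallowed(robots, "/wiki/Main_Page")     # FALSE
```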

Team up!

Data and Code